Managing Provenance in Scientific Workflows with ProvManager

Authors

  • Anderson Marinho
  • Leonardo Murta
  • Cláudia Werner
  • Vanessa Braganholo
  • Sérgio Manuel Serra da Cruz
  • Eduardo Ogasawara
  • Marta Mattoso
  • Celso Suckow da Fonseca

Abstract

Running scientific workflows in distributed environments is motivating the definition of provenance gathering approaches that are loosely coupled to workflow systems. We have proposed a provenance gathering strategy that is independent of workflow system technology. This strategy has evolved into a provenance management system named ProvManager. Its main principle is that each workflow activity should collect its own provenance data and publish them to a repository that scientists can query. In this paper, we show how provenance is captured across distributed, heterogeneous systems. Two main strategies are used to capture provenance: Prolog predicates that register provenance, and an API for communication between the wrapped activity and ProvManager.

Similar resources

Challenges in Managing Implicit and Abstract Provenance Data: Experiences with ProvManager

Running scientific workflows in distributed and heterogeneous environments has been motivating the definition of provenance gathering approaches that are loosely coupled to workflow management systems. We have developed a provenance management system named ProvManager to manage provenance data in distributed and heterogeneous environments independent of a specific Scientific Workflow Management...

Integrating Provenance Data from Distributed Workflow Systems with ProvManager

Running scientific workflows in distributed environments is motivating the definition of provenance gathering approaches that are loosely coupled to the workflow execution engine. This kind of approach is interesting because it allows both storage and access to provenance data in an integrated way, even in an environment where different workflow management systems work together. Therefore, we h...

Isolation Levels for Data Sharing in Large-Scale Scientific Workflows

Scientists can benefit from Grid and Cloud infrastructures to face the increasing need to share scientific data and execute data-intensive workflows at a large scale. However, these workflows are creating more and more challenging problems in the automation of data management during execution. Existing workflow management systems focus on how data is stored, transferred and on data provenance. H...

Project Histories: Managing Data Provenance Across Collection-Oriented Scientific Workflow Runs

While a number of scientific workflow systems support data provenance, they primarily focus on collecting and querying provenance for single workflow runs. Scientific research projects, however, typically involve (1) many interrelated workflows (where data from one or more workflow runs are selected and used as input to subsequent runs) and (2) tasks between workflow runs that cannot be fully a...

Managing Rapidly-Evolving Scientific Workflows

We give an overview of VisTrails, a system that provides an infrastructure for systematically capturing detailed provenance and streamlining the data exploration process. A key feature that sets VisTrails apart from previous visualization and scientific workflow systems is a novel action-based mechanism that uniformly captures provenance for data products and workflows used to generate these pr...

Publication date: 2010